Sentence-Level Multilingual Multi-modal Embedding for Natural Language Processing

Authors

  • Iacer Calixto
  • Qun Liu
Abstract

We propose a novel discriminative ranking model that learns embeddings from multilingual and multi-modal data, meaning that our model can take advantage of images and descriptions in multiple languages to improve embedding quality. To that end, we introduce an objective function that uses pairwise ranking adapted to the case of three or more input sources. We compare our model against different baselines, and evaluate the robustness of our embeddings on image–sentence ranking (ISR), semantic textual similarity (STS), and neural machine translation (NMT). We find that the additional multilingual signals lead to improvements on all three tasks, and we highlight that our model can be used to consistently improve the adequacy of translations generated with NMT models when re-ranking n-best lists.
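
The abstract does not give the objective itself, so the following is only a minimal sketch of how a pairwise ranking loss might be extended to three or more input sources. It assumes a standard max-margin formulation with cosine similarity, summed over every pair of sources. The function names, the margin value, and the toy vectors are all hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_ranking_loss(x, y, margin=0.2):
    """Max-margin ranking loss between two aligned batches of embeddings.

    x[i] and y[i] describe the same item; every y[j] with j != i is treated
    as a contrastive example for x[i], and every x[j] as one for y[i].
    This is an assumed formulation, not the paper's exact objective.
    """
    loss = 0.0
    for i in range(len(x)):
        positive = cosine(x[i], y[i])
        for j in range(len(x)):
            if j == i:
                continue
            loss += max(0.0, margin - positive + cosine(x[i], y[j]))
            loss += max(0.0, margin - positive + cosine(x[j], y[i]))
    return loss


def multi_source_ranking_loss(sources, margin=0.2):
    """Sum the pairwise loss over every unordered pair of sources.

    `sources` maps a source name (for example an image encoder or one
    language) to a batch of embeddings; all batches are aligned by index.
    """
    names = sorted(sources)
    return sum(
        pairwise_ranking_loss(sources[a], sources[b], margin)
        for a, b in combinations(names, 2)
    )


# Toy data: two items, three sources (hypothetical 2-d embeddings).
aligned = {
    "image": [[1.0, 0.0], [0.0, 1.0]],
    "en": [[1.0, 0.0], [0.0, 1.0]],
    "de": [[1.0, 0.0], [0.0, 1.0]],
}
# Same data, but the German descriptions are swapped between the two items.
misaligned = dict(aligned, de=[[0.0, 1.0], [1.0, 0.0]])

aligned_loss = multi_source_ranking_loss(aligned)
misaligned_loss = multi_source_ranking_loss(misaligned)
```

In this sketch the loss is zero when all three sources agree on which embeddings belong together, and it grows when one source is misaligned. That is the kind of signal an additional language could contribute under the stated assumptions.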

Similar resources

Multilingual Multi-modal Embeddings for Natural Language Processing

We propose a novel discriminative model that learns embeddings from multilingual and multi-modal data, meaning that our model can take advantage of images and descriptions in multiple languages to improve embedding quality. To that end, we introduce a modification of a pairwise contrastive estimation optimisation function as our training objective. We evaluate our embeddings on an image–sentenc...

Deep Fragment Embeddings for Bidirectional Image Sentence Mapping

We introduce a model for bidirectional retrieval of images and sentences through a deep, multi-modal embedding of visual and natural language data. Unlike previous models that directly map images or sentences into a common embedding space, our model works on a finer level and embeds fragments of images (objects) and fragments of sentences (typed dependency tree relations) into a common space. W...

ExB Text Summarizer

We present our state-of-the-art multilingual text summarizer capable of single as well as multi-document text summarization. The algorithm is based on repeated application of TextRank on a sentence similarity graph, a bag of words model for sentence similarity and a number of linguistic pre- and post-processing steps using standard NLP tools. We submitted this algorithm for two different tasks of...

MMCR4NLP: Multilingual Multiway Corpora Repository for Natural Language Processing

Multilinguality is gradually becoming ubiquitous in the sense that more and more researchers have successfully shown that using additional languages helps improve the results in many Natural Language Processing tasks. Multilingual Multiway Corpora (MMC) contain the same sentence in multiple languages. Such corpora have been primarily used for Multi-Source and Pivot Language Machine Translation b...

Different Approaches to Build Multilingual Conversational Systems

The paper describes developments and results of the work being carried out during the European research project CATCH-2004 (Converse in AThens Cologne and Helsinki). The objective of the project is multi-modal, multi-lingual conversational access to information systems. This paper concentrates on issues of the multilingual telephony-based speech and natural language understanding components.

Publication date: 2017